
Public Library of Science (PLOS)

Using Unsupervised Patterns to Extract Gene Regulation Relationships for Network Construction

Author: A Ozgur
BJ Stapley
C Blaschke
C Nedellec
C Rodriguez-Penagos
CC van der Eijk
CF Schaefer
D Klein
D Klein
Dongxiao Zhu
E Buyko
Hei-Chia Wang
HM Muller
Hung-Yu Kao
J Saric
J Saric
JH Chiang
K Fundel
L Tanabe
M Huang
R Chowdhary
R Hoffmann
R Jelier
S Kim
S Pyysalo
Shaw-Jenq Tsai
Shuo-Jang Li
T Ono
TK Jenssen
U Hahn
Yi-Tsung Tang
Publication venue: Public Library of Science
Publication date: 01/01/2011

BACKGROUND: Gene expression regulation is usually described in the literature in the form "transcription factor X regulates target gene Y". Previous studies have discovered gene regulation relationships using information from the biomedical literature, but most of them require human annotators to build the training dataset. Moreover, the textual knowledge recorded in the biomedical literature grows very rapidly, which makes the manual creation of patterns from the literature increasingly difficult. There is therefore a growing need to automate the process of establishing patterns. METHODOLOGY/PRINCIPAL FINDINGS: In this article, we describe an unsupervised pattern generation method called AutoPat. It is a gene expression mining system that automatically generates patterns from a given set of seed patterns. The high scalability and low maintenance cost of the unsupervised patterns could help the system extract gene expression relationships from PubMed abstracts more precisely and effectively. CONCLUSIONS/SIGNIFICANCE: Experiments on several regulators show reasonable precision and recall rates, which validate AutoPat's practical applicability. The corresponding regulation networks could also be built precisely and effectively. The system in this study is available at http://ikmbio.csie.ncku.edu.tw/AutoPat/
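As a rough illustration of what a seed pattern of the form "X regulates Y" might look like, the following is a minimal sketch using regular expressions. The verb list, the gene-token expression, and the function name `extract_regulations` are all assumptions made for this example; they are not taken from AutoPat, which generates its patterns automatically from seeds.

```python
import re

# Hypothetical seed verbs, chosen for illustration only (not AutoPat's seeds).
SEED_VERBS = ["regulates", "activates", "represses", "induces", "inhibits"]

# Assumed shape of a gene-like token, e.g. "p53", "MDM2", "NF-kB".
GENE = r"[A-Za-z][A-Za-z0-9\-]*"

# Matches "<regulator> <verb> [the] [expression of] <target>".
PATTERN = re.compile(
    rf"\b(?P<regulator>{GENE})\s+(?P<verb>{'|'.join(SEED_VERBS)})\s+"
    rf"(?:the\s+)?(?:expression\s+of\s+)?(?P<target>{GENE})\b"
)


def extract_regulations(sentence):
    """Return (regulator, verb, target) triples found in one sentence."""
    return [
        (m.group("regulator"), m.group("verb"), m.group("target"))
        for m in PATTERN.finditer(sentence)
    ]


if __name__ == "__main__":
    print(extract_regulations("p53 activates MDM2 in response to stress."))
    print(extract_regulations("HIF1A regulates the expression of VEGFA."))
```

A fixed pattern like this only covers the simplest sentence form; it has no gene-name recognition and would accept any word in the regulator or target slot, which is part of why hand-written patterns become hard to maintain as the literature grows.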

CiteSeerX

Automatic reconstruction of a bacterial regulatory network using Natural Language Processing

Author: AM Cohen
C Friedman
Carlos Rodríguez-Penagos
D Corney
G Demetriou
H Salgado
H Schmid
Heladia Salgado
IM Keseler
Irma Martínez-Flores
J Saric
J Saric
JM Cherry
Julio Collado-Vides
L Grivell
L Hirschman
M Hucka
M Krallinger
M Krallinger
M Scherf
MD Yandell
PD Karp
R Grishman
R Hoffmann
R Rodriguez-Esteban
S Abney
Publication venue: BioMed Central
Publication date: 01/08/2007

Abstract. Background: Manual curation of biological databases, an expensive and labor-intensive process, is essential for high-quality integrated data. In this paper we report the implementation of a state-of-the-art Natural Language Processing system that creates computer-readable networks of regulatory interactions directly from different collections of abstracts and full-text papers. Our major aim is to understand how automatic annotation using text-mining techniques can complement manual curation of biological databases. We implemented a rule-based system to generate networks from different sets of documents dealing with regulation in Escherichia coli K-12. Results: Performance evaluation is based on the most comprehensive transcriptional regulation database for any organism, the manually curated RegulonDB, 45% of which we were able to recreate automatically. From our automated analysis we were also able to find some new interactions from papers not already curated, or that were missed in the manual filtering and review of the literature. We also put forward a novel Regulatory Interaction Markup Language better suited than SBML for simultaneously representing data of interest for biologists and text miners. Conclusion: Manual curation of the output of automatic processing of text is a good way to complement a more detailed review of the literature, either for validating the results of what has been already annotated, or for discovering facts and information that might have been overlooked at the triage or curation stages.

Text-mining of PubMed abstracts by natural language processing to create a public knowledge base on molecular mechanisms of bacterial enteropathogens

Author: AK Roos
C Rodriguez-Penagos
David Pot
E Boutet
G Wu
Guy Plunkett
HM Muller
JD Glasner
JD Glasner
JD Glasner
Jeremy D Glasner
JM Greene
Joel Fedorko
John M Greene
Jon Whitmore
M Demerec
M Krallinger
M Riley
Matthew Shaker
Mila Ramos-Santacruz
Nicole T Perna
P Stothard
Panna Shetty
R Hoffman
RD Fleischmann
RK Aziz
S Gama-Castro
S Kim
Sam Zaremba
Thomas Hampton
Y-C Fang
Publication venue: BioMed Central
Publication date: 01/01/2009

Abstract. Background: The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) has a goal of providing bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions from the scientific literature is critical to support research in this field. Natural Language Processing (NLP), and in particular Information Extraction (IE) technology, can be a significant aid to this process. Description: We have trained a powerful, state-of-the-art IE technology on a corpus of abstracts from the microbial literature in PubMed to automatically identify and categorize biologically relevant entities and predicative relations. These relations include: Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an average F-measure of greater than 90% for entities (genes, operons, etc.) and over 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text mining application. Conclusion: Our text mining application is available online on the ERIC website (http://www.ericbrc.org/portal/eric/articles). The information retrieval interface displays a list of recently published enteropathogen literature abstracts, and also provides a search interface to execute custom queries by keyword, date range, etc. Upon selection, processed abstracts and the entities and relations extracted from them are retrieved from a relational database and marked up to highlight the entities and relations. The abstract also provides links from extracted genes and gene products to the ERIC Annotations database, thus providing access to comprehensive genomic annotations and adding value to both the text-mining and annotations systems.
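For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how precision, recall, and the F-measure are conventionally computed from counts of true positives, false positives, and false negatives. The function name `f_measure` and the example counts are assumptions for illustration; the abstract does not state how its evaluation was implemented or which counts produced its figures.

```python
def f_measure(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta) from raw counts.

    tp: extractions that match the gold standard
    fp: extractions not in the gold standard
    fn: gold-standard items the system missed
    beta=1.0 gives the balanced F1 score (harmonic mean).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


if __name__ == "__main__":
    # Made-up counts, purely to show the call.
    print(f_measure(tp=90, fp=10, fn=10))
```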

GenCLiP: a software program for clustering gene lists by literature profiling and constructing gene co-occurrence networks related to custom keywords

Author: AA Schaffer
BT Alako
C Plake
C Rodriguez-Penagos
D Chaussabel
D Lee
EG Cerami
G Karakiulakis
H Kim
Hui-Yong Tian
Jin Zhao
K Fundel
Kai-Tai Yao
KJ Bussey
LJ Jensen
M Bundschus
M Suderman
MB Eisen
N Daraselia
P Shannon
R Hammamieh
R Hoffmann
R Rubinstein
RT Tsai
S Li
T Ide
TK Jenssen
VK Gajendran
Yi-Bo Zhou
Z Huang
ZF Hu
Zhen-Fu Hu
Zhong-Xi Huang
Publication venue: BioMed Central
Publication date: 01/07/2008

Abstract. Background: Biomedical researchers often want to explore pathogenesis and pathways regulated by abnormally expressed genes, such as those identified by microarray analyses. Literature mining is an important way to assist in this task. Many literature mining tools are now available. However, few of them allow the user to make manual adjustments to zero in on what they want to know in particular. Results: We present our software program, GenCLiP (Gene Cluster with Literature Profiles), which is based on the methods presented by Chaussabel and Sher (Genome Biol 2002, 3(10):RESEARCH0055) that search gene lists to identify functional clusters of genes based on up-to-date literature profiling. Four features were added to this previously described method: the ability to 1) manually curate keywords extracted from the literature, 2) search genes and gene co-occurrence networks related to custom keywords, 3) compare analyzed gene results with negative and positive controls generated by GenCLiP, and 4) calculate probabilities that the resulting genes and gene networks are randomly related. In this paper, we show, with a set of genes differentially expressed between keloids and normal controls, how implementation of functions in GenCLiP successfully identified keywords related to the pathogenesis of keloids and unknown gene pathways involved in the pathogenesis of keloids. Conclusion: With regard to the identification of disease-susceptibility genes, GenCLiP allows one to quickly acquire a primary pathogenesis profile and identify pathways involving abnormally expressed genes not previously associated with the disease.
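The idea of a gene co-occurrence network can be sketched in a few lines: two genes are linked when they are mentioned in the same abstract, and the edge weight is the number of such abstracts. The function name `cooccurrence_edges`, the `min_count` threshold, and the toy input below are assumptions for illustration; this is not GenCLiP's implementation, and it omits the keyword filtering and probability calculations the abstract describes.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(abstract_genes, min_count=1):
    """Count how many abstracts mention each pair of genes together.

    abstract_genes: mapping of abstract id -> iterable of gene symbols
    min_count: keep only pairs seen together in at least this many abstracts
    Returns a dict {(gene_a, gene_b): count} with gene_a < gene_b.
    """
    counts = Counter()
    for genes in abstract_genes.values():
        # Sort so each unordered pair always has the same key.
        for a, b in combinations(sorted(set(genes)), 2):
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy data with hypothetical abstract ids.
    docs = {
        "abstract_1": {"TGFB1", "SMAD3"},
        "abstract_2": {"TGFB1", "SMAD3", "COL1A1"},
        "abstract_3": {"COL1A1"},
    }
    print(cooccurrence_edges(docs, min_count=2))
```

Raising `min_count` is a crude way to suppress pairs that appear together only once; a tool that reports probabilities of random association would need a statistical test on top of these counts.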

Extraction of human kinase mutations from literature, databases and genotyping studies

Author: A Baudot
AA Morgan
Alfonso Valencia
AW Burgess
C Greenman
C Ortutay
Carlos Rodriguez-Penagos
CJ Richardson
CJO Baker
D Rebholz-Schuhmann
D Santamaría
F Horn
G Manning
HM Berman
I Shchemelinin
IYS Tam
J Hurst
J Ptacek
JA Ubersax
JG Caporaso
JG Caporaso
Jose MG Izarzugaza
JT den Dunnen
LC Lee
LD Wood
LH Greene
LI Furlong
M Erdogmus
M Huse
M Lesk
Martin Krallinger
P Sanz
R Kanagasabai
R McDonald
R Witte
RD Finn
RE Saunders
RT McDonald
S Bamford
S Yamada
SF Altschul
T Joachims
T Sjöblom
YL Yip
YL Yip
YL Yip
Publication venue: BioMed Central
Publication date: 01/01/2009

Large-scale directional relationship extraction and resolution

Author: A Culotta
A Gladki
A Koike
A Yuryev
AB Clegg
C Rodriguez-Penagos
CM Topinka
Cory B Giles
D Zhou
F Rinaldi
F Rinaldi
H Chen
H Jang
H Kim
I Donaldson
IK Ruf
J Ding
J Jiang
JA Mitchell
JC Park
JD Kim
JD Kim
JD Wren
JD Wren
JD Wren
Jonathan D Wren
JP Vaque
K Fundel
K Sagae
LM Juliano
M Bundschus
M Chagoyen
M Huang
M Lease
M Wang
M-C de Marneffe
N Daraselia
P Zweigenbaum
R Bunescu
R Kuffner
RC Bunescu
RT Tsai
S Kim
S Novichkova
TK Jenssen
W Pratt
Publication venue: BioMed Central
Publication date: 01/01/2008

Linking genes to literature: text mining, information extraction, and retrieval applications for biology

Author: A Divoli
A Doms
A Mitchell
A Sood
Alfonso Valencia
B Alako
B Carpenter
B Settles
BR Haynes
C Batchelor
C Blaschke
C Nedellec
C Rodriguez-Penagos
C Sneiderman
D Chen
D Chen
D Hanisch
D Koning
D Oliver
D Rebholz-Schuhmann
D Searls
D Wheeler
E Camon
F Couto
F Couto
G Divita
G Gomez-Lopez
G Grimes
G Poulter
H Che
H Liu
H Mangalam
H Shatkay
H Yu
I Iliopoulos
I Sarkar
J Baumgartner
J Caporaso
J Chang
J Chang
J Hakenberg
J Hakenberg
J Lewis
J Tamames
J Wilbur
J Wren
K Frantzi
K Mane
K Tomanek
L Chen
L Hunter
L Smith
L Smith
L Tanabe
Lynette Hirschman
M Ashburner
M Craven
M Errami
M Falagas
M Fattore
M Galperin
M Huang
M Krallinger
M Krallinger
M Krauthammer
M Muin
M Ongenaert
M Porter
M Shultz
M Shultz
M Synnestvedt
M Weeber
MA Andrade
Martin Krallinger
MJ Schuemie
N Okazaki
N Smalheiser
N Smalheiser
P Fontelo
P Leary
P Roberts
Q Tu
R Grishman
R Hoffmann
R Hoffmann
R Kittredge
R Netzel
R Steinbrook
S Altschul
S Brady
S Buckingham
S Douglas
S Nelson
S Staab
T Jenssen
T Shtatland
T Vanhecke
W Baumgartner
W Xuan
W Zhou
W Zhou
Y Fang
Y Yamamoto
Z Harris
Publication venue: BioMed Central

Efficient access to information contained in online scientific literature collections is essential for life science research, playing a crucial role from the initial stage of experiment planning to the final interpretation and communication of the results. The biological literature also constitutes the main information source for manual literature curation used by expert-curated databases. Following the increasing popularity of web-based applications for analyzing biological data, new text-mining and information extraction strategies are being implemented. These systems exploit existing regularities in natural language to extract biologically relevant information from electronic texts automatically. The aim of the BioCreative challenge is to promote the development of such tools and to provide insight into their performance. This review presents a general introduction to the main characteristics and applications of currently available text-mining systems for life sciences in terms of the following: the type of biological information demands being addressed; the level of information granularity of both user queries and results; and the features and methods commonly exploited by these applications. The current trend in biomedical text mining points toward an increasing diversification in terms of application types and techniques, together with integration of domain-specific resources such as ontologies. Additional descriptions of some of the systems discussed here are available on the internet.

OptORF: Optimal metabolic and regulatory perturbations for metabolic engineering of microbial strains

Author: A Tomar
A Varma
AM Feist
AM Feist
AP Burgard
AP Burgard
AR Joyce
C Colijn
C Khosla
C Rodriguez-Penagos
CD Herring
CL Barrett
CR Dittrich
CT Trinh
D Segrè
DA Rodionov
DS Lun
EP Gianchandani
G Gosset
G Zhao
GP Ferguson
H Alper
H Alper
J Baumbach
J Zhu
Jennifer L Reed
JF Moxley
JH Park
JJ Faith
JL Reed
Joonhoon Kim
K Patil
MA Asadollahi
MW Covert
MW Covert
P Pharkya
P Pharkya
PF Suthers
R Schuetz
RB Helling
RB Hespell
S Atsumi
S Zhang
SS Fong
SS Fong
SS Fong
SS Fong
SS Levanon
T Baba
T Shlomi
T Shlomi
T Shlomi
T Wang
TB Causey
TS Gardner
Y Kim
Y Kim
Publication venue: Springer Science and Business Media LLC